AncestryPainter is a graphing tool developed by Qidi Feng and Dongsheng Lu in 2018 (https://doi.org/10.1016/j.gpb.2018.05.002). It can visualize ancestry composition and genetic difference, and merge the ancestry proportion matrices output by ancestry inference tools such as ADMIXTURE.
Since August 2021, a new version of AncestryPainter has been under development by Shuanghui Chen and her collaborators. The project is still ongoing.
See DESCRIPTION for more details about our team!
R 3.3.3 "Another Canoe".
The graphing and statistical functions of AncestryPainterV2 rely on the following base packages of R, which are attached by default:
graphics
grDevices
stats
utils
Generally, if R is installed on your device, there is no
need to install or load these packages separately.
You can install our software with the R package “devtools”, provided you have a working network connection.
> if (!require("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_github("Shuhua-Group/AncestryPainterV2")
Alternatively, you can obtain the compressed package manually and install it with an R command such as:
install.packages("/path/to/the/package/AncestryPainterV2.x.y.tar.gz")
Replace “x” and “y” with the version numbers of the
AncestryPainterV2 package archive you downloaded from GitHub
or another source.
To check whether the installation succeeded, try loading the package:
library("AncestryPainterV2")
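If you prefer a check that does not stop with an error when the package is missing, the following minimal sketch uses only base R. It assumes nothing about AncestryPainterV2 beyond its package name.

```r
# Check for the package without attaching it; requireNamespace() returns
# TRUE or FALSE instead of raising an error.
pkg <- "AncestryPainterV2"
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  message(pkg, " version ", as.character(utils::packageVersion(pkg)))
} else {
  message(pkg, " is not installed; see the installation commands above.")
}
```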
This part describes the functions and parameters of AncestryPainter. The most frequently used parameters are explained further, including those for the input data, the plot layout, and the major plotting elements.
We strongly suggest that users save the information used as arguments (ancestry proportions, genetic distances, individual and group labels, color codes) in tab/space-delimited text files and read this information into the R environment.
For more details, please refer to the R documentation (the “.Rd” files
in the man folder, or search via “Help” in RStudio) of
AncestryPainterV2.
sectorplot(Q, ind, target = NULL, poporder = NULL, ancescols = NULL,
sorting = FALSE, rmin = 2, rmax = 3.7, tar.r = 0.6,
tarangs = NULL, cendis = 1, amin = -265, amax = 85,
tarang1 = 0, tarang2 = 180, arrow = FALSE, legend_mode = FALSE,
ancesnames = NULL, prgap = 0.2, noline = FALSE,
pop.lab.cex = NULL, pop.lab.col = "black", pop.lab.font = 1,
tar.lab.cex = 6, tar.lab.col = "navy", arrow.col = "red",
arrow.lwd = 2, legend.pos = "topright")
Q
A numeric data frame of ancestry proportion (columns: ancestry
component; rows: individual). e.g., an output “.Q” file of the software
ADMIXTURE.
ind
A two-column data frame (1: population; 2: individual).
target
Character. The target populations to be plotted as a pie chart in the
center of the circle figure. The population must be included in input
“ind” and “Q” data frame.
poporder
Character. The populations to be included in the figure, also the
display order of the populations in the figure.
ancescols
The color code of each ancestry component in the figure.
sorting
A logical value that defines whether to sort the populations.
Ignored if “poporder” is specified.
rmin
The radius of the inner ring. Default is 2.
rmax
The radius of the outer ring. Default is 3.7.
tarangs
A numeric vector. The angles of the target pie charts.
cendis
A numeric vector. The distance from the center of a target pie chart to
the center of the sectorplot. Default: 1.
amin
The angle at which the ring is initiated. Default is -265.
amax
The angle at which the ring is ended. Default is 85.
tarang1
The start angle of the target layout. Default is 0.
tarang2
The end angle of the target layout. Default is 360.
arrow
Logical. Whether to draw the arrows to the target pies.
legend_mode
Logical. Whether to draw the legend of ancestry components.
ancesnames
Character. The names of the ancestry components. If not
specified, they are shown as “Ancestry_1”, “Ancestry_2”, and so on.
noline
Logical. Whether to remove the black lines between populations. Default
is FALSE.
radiationplot(data, target = "target", sorting = FALSE,
layers = NULL, num = 4, digits = 2, cenvals = c(0.5, 0.5),
border = 0.3, amax = -250, amin = 70, rstart = 0.02,
flat = 1.2, label_mode = TRUE, pop.lab.cex = 0.7,
pop.lab.font = 1, pop.lab.col = "navy", legend_mode = FALSE,
legend.pos = "topright", legend.lwd = 5, legend.text.cex = 1,
ring.text.col = "black", ring.text.cex = 1, ring.text.font = 1,
ring.line.col = "gray", tar.lab.col = "black", tar.lab.cex = 1.1,
tar.lab.font = 1, core.line.col = "black")
data
A four-column data frame. 1: population label (character); 2: region
label (character); 3: genetic difference (numeric); 4: color code.
target
A character value to pass name of the target population. Default is
“target”.
sorting
Logical. Whether to sort population order according to their genetic
difference. Default is FALSE.
layers
Numeric values of the layers. This parameter overrides “num” if
specified.
num
An integer. The number of layers. Default is 4.
digits
Numeric. The number of decimal places to which the genetic distance is
rounded. Default is 2.
amax
Numeric. Maximal angle of radiation bars. Default is -250.
amin
Numeric. Minimal angle of radiation bars. Default is 70.
rstart
A numeric value of radius. Default is 0.02.
label_mode
Logical. Whether to print the population labels. Default is TRUE.
legend_mode
Logical. Whether to draw the legend of region information. Default is
FALSE.
The naming style of this group of parameters imitates that of the R package “ggplot2”. The name of a parameter consists of two or three parts. The prefix indicates a subject in the plot (e.g., “tar”, “legend”), and the suffix indicates the feature that can be modified (e.g., “col”, “pos”, “font”, “cex”). In some cases, a middle part between the two names a graphing element of the subject.
For example, the parameter ring.text.col of
radiationplot sets the color of the text annotating the
outer rings.
Again, see the R documentation (in “.Rd” files in the
man folder, or search it by “Help” on RStudio) of
AncestryPainterV2 for more details!
ancmerge(tar_anc_filelist, ref, K, poporder = NULL)
tar_anc_filelist
Character. Required. Ancestry file names (recommended name format:
prefix.K.ancestry). Each ancestry matrix file should have (2 + K) columns
and no header. Columns: 1st, individual ID; 2nd, group ID; from the
3rd column onward, the ancestry proportions.
ref
Character. Required. The reference ancestry matrix to be matched.
K
Integer. Required. The number of the ancestry components.
poporder
Character. (optional, input files) Population order list.
AncestryPainter implements sectorplot to visualize the
ancestry composition of multiple populations.
Users have to provide 1) an ancestry matrix with individuals as rows and ancestry proportions as columns and 2) an annotation containing the group ID and individual ID of each individual. The information for 1) and 2) can be saved in tab/space-delimited text files, “.Q” and “.ind”, respectively.
> # input
> exp_q <- read.table("./inst/extdata/exp_ances.K8.1.Q", header = F)
> exp_ind <- read.table('./inst/extdata/exp_ances.K8.1.ind', stringsAsFactors = F, header = F)
> exp_q[201:205, 1:5]
V1 V2 V3 V4 V5
201 0.113863 0.071362 0.018369 0.012654 1e-05
202 0.149520 0.068620 0.017565 0.017855 1e-05
203 0.129546 0.076118 0.054619 0.015374 1e-05
204 0.229236 0.069375 0.030631 0.009024 1e-05
205 0.763557 0.115103 0.001776 0.011903 1e-05
>
> head(exp_ind)
V1 V2
1 Australian IHW9195
2 Australian IHW9193
3 Australian IHW9118
4 Chukchi ADR00060
5 Chukchi ADR00065
6 Chukchi MC_06
Users can specify the color code of ancestry components if they like. If not, the colors will be randomly generated.
You can input the data manually. However, it would be convenient if the data are saved in files and we strongly recommend that you do so.
Now let’s start with a toy sample. Let’s specify the color code of ancestry.
> exp_cols <- read.table('./inst/extdata/exp_ances.K8.1.color', stringsAsFactors = F, header = F)$V1
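If you do not have a color file yet, one can be generated in base R. The sketch below is only an illustration: the palette (`rainbow`), the temporary file, and the names `exp_cols_demo`, `col_file`, and `cols_back` are assumptions, not part of the package. Note that `read.table` treats `#` as a comment character by default, so a file of hexadecimal color codes needs `comment.char = ""` to be read back correctly.

```r
K <- 8  # number of ancestry components (8 in the toy data set)

# Hypothetical palette: any vector of K valid R colors would do.
exp_cols_demo <- grDevices::rainbow(K)

# Save one color per line, then read the file back.
col_file <- tempfile(fileext = ".color")
writeLines(exp_cols_demo, col_file)

# comment.char = "" keeps lines that start with "#" (hex color codes).
cols_back <- read.table(col_file, stringsAsFactors = FALSE, header = FALSE,
                        comment.char = "")$V1
```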
Note that the figure may be too large to be shown in the “Plots” pane of RStudio on some devices, in which case the error “figure margins too large” may appear. To avoid this, write your plot to a graphics file (“.pdf”, “.png”, “.jpg”, etc.), like
> # Graphing
> pdf("exp_ances.8.basic.pdf", width = 45, height = 45)
> sectorplot(Q = exp_q, ind = exp_ind, ancescols = exp_cols)
No population order specified.
> dev.off()
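If the plotting call fails, the PDF device stays open and later plots keep going to the same file. A minimal sketch of one way to guard against this uses `tryCatch` with `finally`. The base `plot` call is only a placeholder for `sectorplot` or `radiationplot`, and the file name is hypothetical.

```r
out_file <- file.path(tempdir(), "demo_plot.pdf")  # hypothetical output path

grDevices::pdf(out_file, width = 10, height = 10)
tryCatch({
  # Placeholder: replace with your sectorplot() or radiationplot() call.
  plot(1:10, main = "placeholder plot")
}, finally = {
  # Runs whether or not the plotting code raised an error.
  grDevices::dev.off()
})
```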
An important feature of our software is to display the ancestry composition of the target population(s) by pie charts in the center of the plot. To make use of this feature, you can specify the “target” parameter. In contrast to Version 1, Version 2 supports multiple target pie charts.
> exp_tars <- c("Yoruba", "French", "Han")
> sectorplot(Q = exp_q, ind = exp_ind, ancescols = exp_cols, target = exp_tars)
You can modify the layout of the target pies via two parameters “tarangs” (which defines the angle between the line from the center of the sectorplot to the center of a target pie chart and the positive x-axis) and “cendis” (which defines the distance from the center of any target pie chart, like A, B, C in the figure below, to the center of the sectorplot).
Generally, you can input two numeric vectors with the same length. If they are not of the same length, the shorter one will be coerced to match the longer one by extension and a warning message will be thrown.
> sectorplot(Q = exp_q, ind = exp_ind, ancescols = exp_cols, target = exp_tars, tarangs = c(120, 240, 360), cendis = c(0.9, 0.9, 1.2))
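The geometry described above can be sketched in base R. This is an illustration only, not the package's internal code: it assumes that the shorter vector is extended by R-style recycling and that angles are given in degrees, measured counter-clockwise from the positive x-axis. All variable names are hypothetical.

```r
tarangs <- c(120, 240, 360)  # angles in degrees
cendis  <- c(0.9, 1.2)       # deliberately shorter than tarangs

# Extend both vectors to a common length (assumed: R-style recycling).
n <- max(length(tarangs), length(cendis))
tarangs_full <- rep_len(tarangs, n)
cendis_full  <- rep_len(cendis, n)

# Convert polar coordinates (distance, angle) to x/y offsets
# from the center of the sectorplot.
theta <- tarangs_full * pi / 180
centers <- data.frame(x = cendis_full * cos(theta),
                      y = cendis_full * sin(theta))
```

Passing vectors of equal length avoids relying on how the shorter one is extended.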
For convenience, you can instead use the two parameters “tarang1” and “tarang2”. Given n target pie charts in the plot, “tarang1” defines the angle of the line from the center of the sectorplot to the center of the first target pie chart, and “tarang2” that of the line to the center of the last target pie chart, both measured from the positive x-axis. The remaining centers are spaced evenly in between, so the center of the i-th target pie chart lies on the line at an angle of tarang1 + (i - 1) × (tarang2 - tarang1) / (n - 1).
The following example shows the usage of “tarang1” and “tarang2”.
> sectorplot(Q = exp_q, ind = exp_ind, ancescols = exp_cols, target = exp_tars, tarang1 = 90, tarang2 = 330)
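To preview which angles a given pair of “tarang1” and “tarang2” corresponds to, evenly spaced angles can be computed with `seq`. This sketch assumes even spacing between the first and last target and does not call the package.

```r
tarang1 <- 90
tarang2 <- 330
n <- 3  # number of target pie charts

# Evenly spaced angles from tarang1 to tarang2, one per target.
angles <- seq(tarang1, tarang2, length.out = n)
print(angles)  # 90 210 330
```

Under this assumption, the call above would be equivalent to passing `tarangs = c(90, 210, 330)`.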
In particular, if you want a target pie to be located right at the center of the plot, you can set “cendis” to zero. To show the position of the target populations on the outer ring, you can add arrows to the targets. Remember to keep your populations well sorted if you specify multiple targets.
> sectorplot(Q = exp_q, ind = exp_ind, ancescols = exp_cols, target = exp_tars[3], cendis = 0, arrow = T)
You can specify the population order, like
> exp_order <- read.table('./inst/extdata/exp_ances.K8.1.order', stringsAsFactors = F, header = F)$V1
> exp_order
[1] "Australian" "Chukchi" "Dai" "Eskimo"
[5] "French" "Han" "Inga" "Japanese"
[9] "Ju_hoan_North" "Kazak" "Kirgiz" "Koryak"
[13] "Mala" "Mbuti" "Onge" "Sardinian"
[17] "Sindhi" "Yoruba"
> sectorplot(Q = exp_q, ind = exp_ind, target = exp_tars, ancescols = exp_cols, poporder = exp_order)
Use the specified population order...
Note that when loading the population order from a text file, “stringsAsFactors” should be set to FALSE so that the populations can be sorted as specified; otherwise, errors will occur.
Alternatively, you can just let the software sort the populations.
> sectorplot(Q = exp_q, ind = exp_ind, ancescols = exp_cols, target = exp_tars, sorting = T)
No population order specified.
Sort the population order...
To show the type of ancestry components, you can use a legend. You can define the ancestries by yourself or just let the software name them “Ancestry_1” “Ancestry_2” and so on.
> exp_anc <- c("SA", "OC", "WE", "AM", "AF1", "SIB", "EA", "AF2")
> sectorplot(Q = exp_q, ind = exp_ind, ancescols = exp_cols, target = exp_tars, tarang1 = 90, tarang2 = 330, sorting = T, legend_mode = T, ancesnames = exp_anc)
We provide some parameters to help you make your graph more beautiful.
You can modify the font, size, and color of the target/population labels, the position of the legend, and so on. Note that the color of the population labels can be specified as a vector with a length the same as the number of the populations so that the populations from different groups can be indicated by different colors.
> exp_info <- read.table('./inst/extdata/exp_ances.K8.1.info', header = T, row.names = 1)
> head(exp_info)
Region Latitude Longitude
Australian Oceania -13.0 143.0
Chukchi CentralAsiaSiberia 69.5 168.8
Dai EastAsia 21.0 100.0
Eskimo CentralAsiaSiberia 64.5 172.9
French WestEurasia 46.0 2.0
French_South WestEurasia 43.4 -0.6
> exp_info$popcol <- sapply(exp_info$Region, FUN = function(x) ifelse(x == "Africa", "darkred", "cyan4"))
> sectorplot(Q = exp_q, ind = exp_ind, ancescols = exp_cols, target = exp_tars, tarangs = c(90, 210, 330), poporder = exp_order, legend_mode = T, ancesnames = exp_anc, legend.pos = "topleft", tar.r = 0.45, tar.lab.col = "orange", pop.lab.col = exp_info$popcol, pop.lab.cex = 4)
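When the information table lists populations that are not plotted, or lists them in a different order, the color vector may need to be aligned with the population order first. The sketch below is hedged: it assumes that label colors are matched to populations by position in “poporder”, which you should verify in the R documentation. The toy table, the region-to-color map, and all names ending in `_demo` are hypothetical.

```r
# Toy stand-in for exp_info: row names are populations.
exp_info_demo <- data.frame(
  Region = c("Oceania", "WestEurasia", "WestEurasia", "Africa"),
  row.names = c("Australian", "French", "French_South", "Yoruba"),
  stringsAsFactors = FALSE
)

# Named lookup vector: one color per region.
region_cols <- c(Africa = "darkred", Oceania = "cyan4", WestEurasia = "navy")
exp_info_demo$popcol <- unname(region_cols[exp_info_demo$Region])

# Select and reorder colors by row name so they follow the plotting order.
exp_order_demo <- c("Yoruba", "French", "Australian")
popcol_ordered <- exp_info_demo[exp_order_demo, "popcol"]
```

A named lookup vector also scales to more than two regions more readily than nested `ifelse` calls.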
radiationplot can be used to visualize the genetic
distance from one target population to other populations. The code was
adapted from what was used in a study of the genetic history of Tibetan
highlanders (Lu, et al, 2016).
To use this plot, a four-column data frame containing the populations, regions, genetic differences, and color codes is required. This information can be saved in a tab/space-delimited text file.
> # input
> exp_fst.local <- read.table("./inst/extdata/exp_fst.local.txt", stringsAsFactors = F, header = F)
> exp_fst.local
V1 V2 V3 V4
1 Egyptian Africa 0.0497950 blue
2 Ethiopian_Jew Africa 0.0729730 blue
3 Esan Africa 0.1479100 blue
4 Even CentralAsiaSiberia 0.0090195 cyan
5 Eskimo CentralAsiaSiberia 0.0596530 cyan
6 Estonian WestEurasia 0.0478740 yellow
7 English WestEurasia 0.0499280 yellow
> # Graphing
> pdf("exp_fst.local.8.basic.pdf", width = 10, height = 10)
> radiationplot(data = exp_fst.local, target = "TAR0")
> dev.off()
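If your genetic distances are not yet in a file, the four-column input can be assembled in R and written out. This is a minimal sketch using a few values from the table above. The lookup vectors and the names `fst`, `region`, `region_col`, and `fst_demo` are hypothetical.

```r
# Genetic difference per population (values taken from the example table).
fst <- c(Egyptian = 0.0497950, Esan = 0.1479100,
         Even = 0.0090195, Estonian = 0.0478740)

# Region of each population, and one color per region.
region <- c(Egyptian = "Africa", Esan = "Africa",
            Even = "CentralAsiaSiberia", Estonian = "WestEurasia")
region_col <- c(Africa = "blue", CentralAsiaSiberia = "cyan",
                WestEurasia = "yellow")

# Columns: 1 population, 2 region, 3 genetic difference, 4 color code.
pops <- names(fst)
fst_demo <- data.frame(V1 = pops,
                       V2 = unname(region[pops]),
                       V3 = unname(fst),
                       V4 = unname(region_col[region[pops]]),
                       stringsAsFactors = FALSE)

# Write a tab-delimited file without header, then read it back.
fst_file <- tempfile(fileext = ".txt")
write.table(fst_demo, fst_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
fst_back <- read.table(fst_file, stringsAsFactors = FALSE, header = FALSE)
```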
To make the populations sorted in an order according to their genetic distance to the target, you can use the flag “sorting”.
You can change the number of the outer layers indicating the genetic distance via the parameter “num”, and make the values on the ring rounded to a certain decimal space via “digits”.
> radiationplot(data = exp_fst.local, target = "TAR0", sorting = T, num = 5, digits = 3)
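To preview the effect of sorting and rounding before plotting, the same operations can be applied to the input by hand. This sketch only illustrates the idea: it assumes ascending order, which may differ from what the “sorting” flag does internally, and the data frame `d` is a hypothetical subset of the example data.

```r
d <- data.frame(pop = c("Egyptian", "Esan", "Even"),
                fst = c(0.0497950, 0.1479100, 0.0090195),
                stringsAsFactors = FALSE)

# Order populations by genetic difference (ascending order is an assumption).
d_sorted <- d[order(d$fst), ]

# Round to three decimal places, as "digits = 3" would request.
fst_rounded <- round(d_sorted$fst, 3)
```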
You can also specify the values of the outer layers directly by the parameter “layers”.
> radiationplot(data = exp_fst.local, target = "TAR0", sorting = T, layers = c(0.03, 0.06, 0.09, 0.12))
If populations are from several regions and you would like to show
the information, you can switch the “legend_mode” flag to
TRUE, and color the radiation bars according to their
region info.
> radiationplot(data = exp_fst.local, target = "TAR0", sorting = T, layers = c(0.03, 0.06, 0.09, 0.12), legend_mode = T)
radiationplot uses similar parameters for aesthetics as
sectorplot. For more details, please refer to the R
documentation.
This function is translated from part of the Python code authored by Yuwen Pan (https://github.com/Shuhua-Group/ADMIXTURE.merge). It merges the ancestry proportion matrices estimated by software like ADMIXTURE from the same dataset with the same number of ancestry components (K).
Here, the input ancestry matrices are loaded into the R environment
by the R base function list.files.
The input file is tab/space-delimited, and contains (2 + K) columns, in which the first two columns are IDs of individuals and groups and the rest indicates the proportion of K ancestry components. The input file can be constructed by merging the information in “.Q” and “.ind” files.
> ancfiles <- list.files("./inst/extdata/", pattern = "[0-9]\\.ancestry", full.names = T)
> print(ancfiles)
[1] "./inst/extdata//ances.8.10.ancestry" "./inst/extdata//ances.8.1.ancestry"
[3] "./inst/extdata//ances.8.2.ancestry" "./inst/extdata//ances.8.3.ancestry"
[5] "./inst/extdata//ances.8.4.ancestry" "./inst/extdata//ances.8.5.ancestry"
[7] "./inst/extdata//ances.8.6.ancestry" "./inst/extdata//ances.8.7.ancestry"
[9] "./inst/extdata//ances.8.8.ancestry" "./inst/extdata//ances.8.9.ancestry"
> ancmat <- read.table(ancfiles[1], header = F)
> head(ancmat)
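The (2 + K)-column layout described above can be assembled from a “.Q” matrix and a “.ind” table. In this sketch, note the column swap: the “.ind” table is ordered population then individual, whereas the ancestry file expects individual then group. The toy data, the individual IDs, and the output file name are hypothetical.

```r
# Toy inputs with K = 2 ancestry components.
q_demo <- data.frame(V1 = c(0.7, 0.2), V2 = c(0.3, 0.8))
ind_demo <- data.frame(V1 = c("Han", "French"),  # population
                       V2 = c("ID1", "ID2"),     # individual (made-up IDs)
                       stringsAsFactors = FALSE)

# Rows of the two inputs must describe the same individuals in the same order.
stopifnot(nrow(q_demo) == nrow(ind_demo))

# Individual ID first, group ID second, then the K ancestry proportions.
anc_demo <- cbind(ind_demo[, c(2, 1)], q_demo)

# Write without header, following the recommended prefix.K.ancestry pattern.
anc_file <- file.path(tempdir(), "demo.2.ancestry")
write.table(anc_demo, anc_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```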
Then you can run the function ancmerge.
Remember to specify a reference ancestry matrix file against which the
others are compared. You also need to tell AncestryPainter how many ancestry
components the matrices contain.
> results <- ancmerge(tar_anc_filelist = ancfiles, ref = ancfiles[1], K = 8)
Time:
2023-01-22 20:00:05
Path:
/home/users/yourpath/AncestryPainterV2
Done.
ancmerge returns a list of four elements: the
merged ancestry proportion matrix, the supporting ratio of each ancestry
component, the names of the files consistent with the reference, and the
names of the files in conflict with it.
> # Output
> str(results)
List of 4
$ merged_ancestry :'data.frame': 2538 obs. of 10 variables:
..$ 1 : Factor w/ 2538 levels "20100320-1","20100320-2",..: 425 427 433 442 436 434 432 429 441 439 ...
..$ 2 : Factor w/ 215 levels "Egyptian","Moroccan_Jew",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ 3 : num [1:2538] 0.1166 0.1146 0.1005 0.1238 0.0822 ...
..$ 4 : num [1:2538] 0.1253 0.1139 0.0933 0.097 0.0975 ...
..$ 5 : num [1:2538] 1e-05 1e-05 1e-05 1e-05 1e-05 1e-05 1e-05 1e-05 1e-05 1e-05 ...
..$ 6 : num [1:2538] 0.00846 0.00001 0.01142 0.00701 0.00413 ...
..$ 7 : num [1:2538] 1.00e-05 4.88e-03 1.00e-05 1.05e-05 1.00e-05 ...
..$ 8 : num [1:2538] 0.00001 0.00543 0.00001 0.00001 0.00619 ...
..$ 9 : num [1:2538] 0.0867 0.0722 0.1053 0.0826 0.1153 ...
..$ 10: num [1:2538] 0.663 0.689 0.689 0.69 0.695 ...
$ supporting_ratio :'data.frame': 8 obs. of 4 variables:
..$ component : chr [1:8] "comp1" "comp2" "comp3" "comp4" ...
..$ represent_pop : chr [1:8] "Yoruba" "Hadza" "Ju_hoan_North" "Papuan" ...
..$ support_counts: int [1:8] 10 2 10 10 10 2 10 10
..$ support_ratio : num [1:8] 1 0.2 1 1 1 0.2 1 1
$ consensus_filelist: chr [1:2] "./inst/extdata//ances.8.10.ancestry" "./inst/extdata//ances.8.4.ancestry"
$ conflict_filelist : chr [1:8] "./inst/extdata//ances.8.1.ancestry" "./inst/extdata//ances.8.2.ancestry" "./inst/extdata//ances.8.3.ancestry" "./inst/extdata//ances.8.5.ancestry" ...
In the merged ancestry proportion data frame, the first two columns are the individual IDs and group (population) IDs. The remaining columns form the ancestry proportion matrix, which can be used as the input argument “Q” of “sectorplot”.
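Splitting the merged data frame into the two inputs of “sectorplot” might look like the sketch below. It uses a toy stand-in for `results$merged_ancestry` with made-up IDs, and the `sectorplot` call is commented out so that the block runs without the package.

```r
# Toy stand-in for results$merged_ancestry (K = 2).
merged_demo <- data.frame(c("ID1", "ID2"), c("Han", "French"),
                          c(0.7, 0.2), c(0.3, 0.8),
                          stringsAsFactors = FALSE)
names(merged_demo) <- as.character(1:4)

# "ind" expects population first, individual second, so swap the columns.
ind_from_merged <- merged_demo[, c(2, 1)]

# Everything after the first two columns is the ancestry matrix.
q_from_merged <- merged_demo[, -(1:2)]

# sectorplot(Q = q_from_merged, ind = ind_from_merged)  # needs AncestryPainterV2
```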
results$supporting_ratio shows the ancestry components,
their representative groups (with the largest ancestry proportion of the
corresponding component), the number of files supporting the reference
for each component, and the supporting ratio.
> results$supporting_ratio
component represent_pop support_counts support_ratio
1 comp1 Yoruba 10 1.0
2 comp2 Hadza 2 0.2
3 comp3 Ju_hoan_North 10 1.0
4 comp4 Papuan 10 1.0
5 comp5 Chane 10 1.0
6 comp6 Korean 2 0.2
7 comp7 Mala 10 1.0
8 comp8 Sardinian 10 1.0
> write.table(results$merged_ancestry, "inst/extdata/merged_ancestry.txt", sep = "\t", col.names = F, row.names = F, quote = F)
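In the output above, the supporting ratio matches the support count divided by the number of input files (10 here). The sketch below recomputes it on a toy table and flags weakly supported components. The 0.5 cutoff and the names `support_demo`, `n_files`, and `low_support` are arbitrary illustrations, not recommendations from the package.

```r
# Toy subset of results$supporting_ratio.
support_demo <- data.frame(component = c("comp1", "comp2", "comp6"),
                           support_counts = c(10L, 2L, 2L),
                           stringsAsFactors = FALSE)
n_files <- 10  # number of ancestry files passed to ancmerge

# Ratio of files that support the reference for each component.
support_demo$support_ratio <- support_demo$support_counts / n_files

# Components supported by fewer than half of the files (arbitrary cutoff).
low_support <- support_demo$component[support_demo$support_ratio < 0.5]
```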
Pugach et al. (2016) introduced a method to infer the admixture sequence of multiple ancestry populations by calculating correlation coefficients between ancestry components, which was applied in our previous study of the Uyghurs in Xinjiang (Feng et al., 2017). We plan to implement this method in the AncestryPainterV2 package while optimizing the calculation of the correlation coefficients.
If you use AncestryPainterV2 in your project, please cite
https://github.com/Shuhua-Group/AncestryPainterV2
If you have any questions or suggestions, feel free to contact us: chenshh@shanghaitech.edu.cn
Feng Q, Lu Y, Ni X, Yuan K, Yang Y, Yang X, Liu C, Lou H, Ning Z, Wang Y, Lu D, Zhang C, Zhou Y, Shi M, Tian L, Wang X, Zhang X, Li J, Khan A, Guan Y, Tang K, Wang S, Xu S. Genetic History of Xinjiang’s Uyghurs Suggests Bronze Age Multiple-Way Contacts in Eurasia. Mol Biol Evol. 2017 Oct 1;34(10):2572-2582. doi: 10.1093/molbev/msx177. PMID: 28595347.
Feng Q, Lu D, Xu S. AncestryPainter: A Graphic Program for Displaying Ancestry Composition of Populations and Individuals. Genomics Proteomics Bioinformatics. 2018 Oct;16(5):382-385. doi: 10.1016/j.gpb.2018.05.002. Epub 2018 Nov 22. PMID: 30472416; PMCID: PMC6364040.
Lu D, Lou H, Yuan K, Wang X, Wang Y, Zhang C, Lu Y, Yang X, Deng L, Zhou Y, Feng Q, Hu Y, Ding Q, Yang Y, Li S, Jin L, Guan Y, Su B, Kang L, Xu S. Ancestral Origins and Genetic History of Tibetan Highlanders. Am J Hum Genet. 2016 Sep 1;99(3):580-594. doi: 10.1016/j.ajhg.2016.07.002. Epub 2016 Aug 25. PMID: 27569548; PMCID: PMC5011065.
Pan Y, Zhang C, Lu Y, Ning Z, Lu D, Gao Y, Zhao X, Yang Y, Guan Y, Mamatyusupu D, Xu S. Genomic diversity and post-admixture adaptation in the Uyghurs. Natl Sci Rev. 2021 Sep 11;9(3):nwab124. doi: 10.1093/nsr/nwab124. PMID: 35350227; PMCID: PMC8953455.
Pugach I, Matveev R, Spitsyn V, Makarov S, Novgorodov I, Osakovsky V, Stoneking M, Pakendorf B. The Complex Admixture History and Recent Southern Origins of Siberian Populations. Mol Biol Evol. 2016 Jul;33(7):1777-95. doi: 10.1093/molbev/msw055. Epub 2016 Mar 18. PMID: 26993256; PMCID: PMC4915357.